OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Automated extraction of chemical structure information from digital raster images

Internal identifier: 000817 (Main/Exploration); previous: 000816; next: 000818

Authors: Jungkap Park [United States]; Gus R. Rosania [United States]; Kerby A. Shedden [United States]; Mandee Nguyen [United States]; Naesung Lyu [United States]; Kazuhiro Saitou [United States]

Source: Chemistry Central Journal, 2009

RBID : PMC:2648963

Abstract

Background

To search for chemical structures in research articles, diagrams or text representing molecules must be translated into a standard chemical file format compatible with cheminformatic search engines. However, chemical information in research articles is often presented as analog diagrams of chemical structures embedded in digital raster images. Several software systems have been developed to automate the analog-to-digital conversion of chemical structure diagrams in scientific research articles, but their algorithmic performance and utility in cheminformatic research have not been investigated.

Results

This paper provides a critical review of these systems and reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. The basic algorithms for recognizing the lines and letters that represent bonds and atoms in chemical structure diagrams can be run independently, in sequence, from a graphical user interface, and the algorithm parameters can be readily changed, which facilitates additional development specifically tailored to a chemical database annotation scheme. Our results indicate that ChemReader outperforms existing software programs such as OSRA, Kekule, and CLiDE on several sets of sample images from diverse sources, in terms of both the rate of correct outputs and the accuracy of extracting molecular substructure patterns.

Conclusion

The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Its stable performance and high accuracy suggest that ChemReader may be sufficiently reliable for annotating chemical databases with links to scientific research articles.


URL: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648963
DOI: 10.1186/1752-153X-3-4
PubMed: 19196483
PubMed Central: 2648963


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Automated extraction of chemical structure information from digital raster images</title>
<author>
<name sortKey="Park, Jungkap" sort="Park, Jungkap" uniqKey="Park J" first="Jungkap" last="Park">Jungkap Park</name>
<affiliation>
<nlm:aff id="I1">Michigan Alliance for Cheminformatic Exploration</nlm:aff>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="I2">Department of Mechanical Engineering, the University of Michigan, 2350 Hayward Street, Ann Arbor, MI 48109, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Mechanical Engineering, the University of Michigan, 2350 Hayward Street, Ann Arbor, MI 48109</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Rosania, Gus R" sort="Rosania, Gus R" uniqKey="Rosania G" first="Gus R" last="Rosania">Gus R. Rosania</name>
<affiliation>
<nlm:aff id="I1">Michigan Alliance for Cheminformatic Exploration</nlm:aff>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="I3">Department of Pharmaceutical Sciences, the University of Michigan College of Pharmacy, 428 Church Street, Ann Arbor, MI 48109, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Pharmaceutical Sciences, the University of Michigan College of Pharmacy, 428 Church Street, Ann Arbor, MI 48109</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Shedden, Kerby A" sort="Shedden, Kerby A" uniqKey="Shedden K" first="Kerby A" last="Shedden">Kerby A. Shedden</name>
<affiliation>
<nlm:aff id="I1">Michigan Alliance for Cheminformatic Exploration</nlm:aff>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="I4">Department of Statistics, the University of Michigan, 1085 South University, Ann Arbor, MI 48109, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Statistics, the University of Michigan, 1085 South University, Ann Arbor, MI 48109</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Nguyen, Mandee" sort="Nguyen, Mandee" uniqKey="Nguyen M" first="Mandee" last="Nguyen">Mandee Nguyen</name>
<affiliation wicri:level="2">
<nlm:aff id="I3">Department of Pharmaceutical Sciences, the University of Michigan College of Pharmacy, 428 Church Street, Ann Arbor, MI 48109, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Pharmaceutical Sciences, the University of Michigan College of Pharmacy, 428 Church Street, Ann Arbor, MI 48109</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Lyu, Naesung" sort="Lyu, Naesung" uniqKey="Lyu N" first="Naesung" last="Lyu">Naesung Lyu</name>
<affiliation wicri:level="2">
<nlm:aff id="I5">Ford Motor Company, 3104B, Advanced Engineering Center, 2400 Village Rd., Dearborn, MI 48121, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Ford Motor Company, 3104B, Advanced Engineering Center, 2400 Village Rd., Dearborn, MI 48121</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Saitou, Kazuhiro" sort="Saitou, Kazuhiro" uniqKey="Saitou K" first="Kazuhiro" last="Saitou">Kazuhiro Saitou</name>
<affiliation>
<nlm:aff id="I1">Michigan Alliance for Cheminformatic Exploration</nlm:aff>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="I2">Department of Mechanical Engineering, the University of Michigan, 2350 Hayward Street, Ann Arbor, MI 48109, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Mechanical Engineering, the University of Michigan, 2350 Hayward Street, Ann Arbor, MI 48109</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">19196483</idno>
<idno type="pmc">2648963</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648963</idno>
<idno type="RBID">PMC:2648963</idno>
<idno type="doi">10.1186/1752-153X-3-4</idno>
<date when="2009">2009</date>
<idno type="wicri:Area/Pmc/Corpus">000065</idno>
<idno type="wicri:Area/Pmc/Curation">000065</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000163</idno>
<idno type="wicri:Area/Ncbi/Merge">000064</idno>
<idno type="wicri:Area/Ncbi/Curation">000064</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000064</idno>
<idno type="wicri:Area/Main/Merge">000825</idno>
<idno type="wicri:Area/Main/Curation">000817</idno>
<idno type="wicri:Area/Main/Exploration">000817</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Automated extraction of chemical structure information from digital raster images</title>
<author>
<name sortKey="Park, Jungkap" sort="Park, Jungkap" uniqKey="Park J" first="Jungkap" last="Park">Jungkap Park</name>
<affiliation>
<nlm:aff id="I1">Michigan Alliance for Cheminformatic Exploration</nlm:aff>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="I2">Department of Mechanical Engineering, the University of Michigan, 2350 Hayward Street, Ann Arbor, MI 48109, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Mechanical Engineering, the University of Michigan, 2350 Hayward Street, Ann Arbor, MI 48109</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Rosania, Gus R" sort="Rosania, Gus R" uniqKey="Rosania G" first="Gus R" last="Rosania">Gus R. Rosania</name>
<affiliation>
<nlm:aff id="I1">Michigan Alliance for Cheminformatic Exploration</nlm:aff>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="I3">Department of Pharmaceutical Sciences, the University of Michigan College of Pharmacy, 428 Church Street, Ann Arbor, MI 48109, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Pharmaceutical Sciences, the University of Michigan College of Pharmacy, 428 Church Street, Ann Arbor, MI 48109</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Shedden, Kerby A" sort="Shedden, Kerby A" uniqKey="Shedden K" first="Kerby A" last="Shedden">Kerby A. Shedden</name>
<affiliation>
<nlm:aff id="I1">Michigan Alliance for Cheminformatic Exploration</nlm:aff>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="I4">Department of Statistics, the University of Michigan, 1085 South University, Ann Arbor, MI 48109, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Statistics, the University of Michigan, 1085 South University, Ann Arbor, MI 48109</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Nguyen, Mandee" sort="Nguyen, Mandee" uniqKey="Nguyen M" first="Mandee" last="Nguyen">Mandee Nguyen</name>
<affiliation wicri:level="2">
<nlm:aff id="I3">Department of Pharmaceutical Sciences, the University of Michigan College of Pharmacy, 428 Church Street, Ann Arbor, MI 48109, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Pharmaceutical Sciences, the University of Michigan College of Pharmacy, 428 Church Street, Ann Arbor, MI 48109</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Lyu, Naesung" sort="Lyu, Naesung" uniqKey="Lyu N" first="Naesung" last="Lyu">Naesung Lyu</name>
<affiliation wicri:level="2">
<nlm:aff id="I5">Ford Motor Company, 3104B, Advanced Engineering Center, 2400 Village Rd., Dearborn, MI 48121, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Ford Motor Company, 3104B, Advanced Engineering Center, 2400 Village Rd., Dearborn, MI 48121</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Saitou, Kazuhiro" sort="Saitou, Kazuhiro" uniqKey="Saitou K" first="Kazuhiro" last="Saitou">Kazuhiro Saitou</name>
<affiliation>
<nlm:aff id="I1">Michigan Alliance for Cheminformatic Exploration</nlm:aff>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="I2">Department of Mechanical Engineering, the University of Michigan, 2350 Hayward Street, Ann Arbor, MI 48109, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Mechanical Engineering, the University of Michigan, 2350 Hayward Street, Ann Arbor, MI 48109</wicri:regionArea>
<placeName>
<region type="state">Michigan</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Chemistry Central Journal</title>
<idno type="eISSN">1752-153X</idno>
<imprint>
<date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>To search for chemical structures in research articles, diagrams or text representing molecules must be translated into a standard chemical file format compatible with cheminformatic search engines. However, chemical information in research articles is often presented as analog diagrams of chemical structures embedded in digital raster images. Several software systems have been developed to automate the analog-to-digital conversion of chemical structure diagrams in scientific research articles, but their algorithmic performance and utility in cheminformatic research have not been investigated.</p>
</sec>
<sec>
<title>Results</title>
<p>This paper provides a critical review of these systems and reports our recent development of ChemReader, a fully automated tool for extracting chemical structure diagrams from research articles and converting them into standard, searchable chemical file formats. The basic algorithms for recognizing the lines and letters that represent bonds and atoms in chemical structure diagrams can be run independently, in sequence, from a graphical user interface, and the algorithm parameters can be readily changed, which facilitates additional development specifically tailored to a chemical database annotation scheme. Our results indicate that ChemReader outperforms existing software programs such as OSRA, Kekule, and CLiDE on several sets of sample images from diverse sources, in terms of both the rate of correct outputs and the accuracy of extracting molecular substructure patterns.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>The availability of ChemReader as a cheminformatic tool for extracting chemical structure information from digital raster images allows research and development groups to enrich their chemical structure databases by annotating the entries with published research articles. Its stable performance and high accuracy suggest that ChemReader may be sufficiently reliable for annotating chemical databases with links to scientific research articles.</p>
</sec>
</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Michigan</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Michigan">
<name sortKey="Park, Jungkap" sort="Park, Jungkap" uniqKey="Park J" first="Jungkap" last="Park">Jungkap Park</name>
</region>
<name sortKey="Lyu, Naesung" sort="Lyu, Naesung" uniqKey="Lyu N" first="Naesung" last="Lyu">Naesung Lyu</name>
<name sortKey="Nguyen, Mandee" sort="Nguyen, Mandee" uniqKey="Nguyen M" first="Mandee" last="Nguyen">Mandee Nguyen</name>
<name sortKey="Rosania, Gus R" sort="Rosania, Gus R" uniqKey="Rosania G" first="Gus R" last="Rosania">Gus R. Rosania</name>
<name sortKey="Saitou, Kazuhiro" sort="Saitou, Kazuhiro" uniqKey="Saitou K" first="Kazuhiro" last="Saitou">Kazuhiro Saitou</name>
<name sortKey="Shedden, Kerby A" sort="Shedden, Kerby A" uniqKey="Shedden K" first="Kerby A" last="Shedden">Kerby A. Shedden</name>
</country>
</tree>
</affiliations>
</record>
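
The identifiers in the record above (DOI, PubMed ID, Wicri area keys) are all carried by `<idno>` elements. As a minimal sketch, assuming the record has been saved to a file with one `<idno>` element per line as in the listing, they could be pulled out with standard tools. The helper names `list_idno` and `get_idno` are hypothetical and are not Dilib commands.

```shell
#!/bin/sh
# Sketch only: list_idno and get_idno are hypothetical helper names, not Dilib commands.
# Assumes each <idno> element sits on its own line, as in the listing above.

# Print "type=value" for every <idno> element in the file given as $1.
list_idno() {
    sed -n 's|.*<idno type="\([^"]*\)">\([^<]*\)</idno>.*|\1=\2|p' "$1"
}

# Print the value of the identifier whose type is $2 (for example "doi").
get_idno() {
    list_idno "$1" | while IFS='=' read -r type value; do
        if [ "$type" = "$2" ]; then
            printf '%s\n' "$value"
        fi
    done
}

# Demonstration on three lines copied from the record above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<idno type="pmid">19196483</idno>
<idno type="doi">10.1186/1752-153X-3-4</idno>
<idno type="wicri:Area/Main/Exploration">000817</idno>
EOF

get_idno "$sample" doi                            # prints 10.1186/1752-153X-3-4
get_idno "$sample" "wicri:Area/Main/Exploration"  # prints 000817
rm -f "$sample"
```

A line-oriented filter is used here because the record uses the `nlm:` and `wicri:` prefixes without declaring them, which a strict XML parser would likely reject.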

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000817 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000817 | SxmlIndent | more
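
The command above pages the record on screen. As a hedged sketch, the same invocation could be wrapped so that the indented record is written to a file and missing prerequisites are reported early. The wrapper name `dump_record` is hypothetical; the `HfdSelect` and `SxmlIndent` calls are reused exactly as shown above, and Dilib is assumed to be installed with `EXPLOR_AREA` set.

```shell
#!/bin/sh
# Sketch only: dump_record is a hypothetical wrapper name, not part of Dilib.
# It reuses the HfdSelect and SxmlIndent invocation shown above unchanged.
# $1 is the record key (for example 000817), $2 is the output file.
dump_record() {
    key=$1
    out=$2
    # Fail early with a readable message instead of a confusing path error.
    if [ -z "${EXPLOR_AREA:-}" ]; then
        echo "EXPLOR_AREA is not set" >&2
        return 1
    fi
    if ! command -v HfdSelect >/dev/null 2>&1; then
        echo "HfdSelect not found in PATH (is Dilib installed?)" >&2
        return 1
    fi
    HfdSelect -h "$EXPLOR_AREA/Data/Main/Exploration/biblio.hfd" -nk "$key" \
        | SxmlIndent > "$out"
}
```

With Dilib available, `dump_record 000817 record-000817.xml` would be expected to save this record for later processing.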

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:2648963
   |texte=   Automated extraction of chemical structure information from digital raster images
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:19196483" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024